Journal article

Propagation, detection and correction of errors using the sequence database network

B Goudey, N Geard, K Verspoor, J Zobel

Briefings in Bioinformatics | Published: 2022

Abstract

Nucleotide and protein sequences stored in public databases are the cornerstone of many bioinformatics analyses. The records containing these sequences are prone to a wide range of errors, including incorrect functional annotation, sequence contamination and taxonomic misclassification. One source of information that can help to detect errors is the strong interdependency between records. Novel sequences in one database draw their annotations from existing records, may generate new records in multiple other locations and will have varying degrees of similarity with existing records across a range of attributes. A network perspective of these relationships between sequence records, within an..
